As there is no unique id or primary key to identify each row, inserting row number as id assuming that each row represents one unqiue person

Open credit lines and total credit lines are highly related to each other for both individual and joint application types

The loan amount due to debt_consolidation is highest among other type for the individual application as compared to joint.

Total loan amount for Califonia State is highest and lowest North Dakota.

People with mortgage homes have the highest loan amount

93% of the credit lines are in current status

Categorical Variables:-

Find out the relationship between categorical variable and dependent feature SalesPrice:-

Let's Find The Missing values:-

Dropping columns with more than 70% missing values:-

Fill the missing values:-

Correlation of Data Set

Encoding Categorical Variables:-

Due to time limit and other activities I was able to perform analysis using the random combination of varibale, if sufficient time would have been given, I would have first understand the business processes and the variables of interest that are key factors in running the business. Secondly, most of the time I would like to devote in understanding the data and cleaning it. Then according to the business requirement, I would deploy models and repeat the processes.